Variable Selection With the Strong Heredity Constraint and Its Oracle Property
Authors
Abstract
In this paper, we extend the LASSO method (Tibshirani 1996) to simultaneously fit a regression model and identify important interaction terms. Unlike most existing variable selection methods, our method automatically enforces the heredity constraint: an interaction term can be included in the model only if the corresponding main terms are also included. Furthermore, we extend our method to generalized linear models and show that it has the oracle property in the sense of Fan and Li (2001) and Fan and Peng (2004), that is, it performs as well as if the true model were given in advance. The proof of the oracle property is given in the online supplemental materials. Numerical results on both simulated and real data indicate that our method tends to remove irrelevant variables more effectively and to provide better prediction performance than previous work (Yuan, Joseph, and Lin 2007; Zhao, Rocha, and Yu 2009) and the classical LASSO method.
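The strong heredity constraint described above can be illustrated with a small sketch. The abstract does not say how the constraint is enforced, so the product form used here (the coefficient of x_j * x_k written as gamma_jk * beta_j * beta_k) is an assumption chosen for illustration, and the function names and numbers are hypothetical.

```python
def violates_strong_heredity(main_effects, interactions):
    """Return the interaction pairs whose two main effects are not both selected."""
    selected = set(main_effects)
    return [(j, k) for (j, k) in interactions
            if j not in selected or k not in selected]


def interaction_coefficients(beta, gamma):
    """Assumed product form: the coefficient of x_j * x_k is gamma_jk * beta_j * beta_k.

    If either main-effect coefficient is zero, the interaction coefficient
    is zero as well, so strong heredity holds by construction.
    """
    return {(j, k): g * beta[j] * beta[k] for (j, k), g in gamma.items()}


# Hypothetical coefficients: the main effect of x2 has been shrunk to zero.
beta = {"x1": 1.5, "x2": 0.0, "x3": -2.0}
gamma = {("x1", "x2"): 0.8, ("x1", "x3"): 0.5, ("x2", "x3"): 1.0}

coefs = interaction_coefficients(beta, gamma)
main = [j for j, b in beta.items() if b != 0]
inter = [pair for pair, c in coefs.items() if c != 0]

print(main)                                   # selected main effects
print(inter)                                  # selected interactions
print(violates_strong_heredity(main, inter))  # empty list: constraint holds
```

In this sketch every interaction involving x2 drops out as soon as the x2 main effect is zero, which is the behavior the constraint requires; a penalty applied separately to each coefficient, as in the classical LASSO, gives no such guarantee.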
Related resources
A Modified Adaptive Lasso for Identifying Interactions in the Cox Model with the Heredity Constraint.
In many biomedical studies, identifying effects of covariate interactions on survival is a major goal. Important examples are treatment-subgroup interactions in clinical trials, and gene-gene or gene-environment interactions in genomic studies. A common problem when implementing a variable selection algorithm in such settings is the requirement that the model must satisfy the strong heredity co...
Group variable selection via a hierarchical lasso and its oracle property
In many engineering and scientific applications, prediction variables are grouped, for example, in biological applications where assayed genes or proteins can be grouped by biological roles or biological pathways. Common statistical analysis methods such as ANOVA, factor analysis, and functional modeling with basis sets also exhibit natural variable groupings. Existing successful group variable...
Factor selection and structural identification in the interaction ANOVA model.
When faced with categorical predictors and a continuous response, the objective of an analysis often consists of two tasks: finding which factors are important and determining which levels of the factors differ significantly from one another. Often, these tasks are done separately using Analysis of Variance (ANOVA) followed by a post hoc hypothesis testing procedure such as Tukey's Honest...
An improved genetic algorithm for multidimensional optimization of precedence-constrained production planning and scheduling
Integration of production planning and scheduling is a class of problems commonly found in the manufacturing industry. This class of problems, which involves precedence constraints, has previously been modeled and optimized by the authors. It requires optimizing several dimensions at the same time: what to make, how many to make, where to make it, and in what order. It is a combinatorial,...
High-Dimensional Sparse Additive Hazards Regression
High-dimensional sparse modeling with censored survival data is of great practical importance, as exemplified by modern applications in high-throughput genomic data analysis and credit risk analysis. In this article, we propose a class of regularization methods for simultaneous variable selection and estimation in the additive hazards model, by combining the nonconcave penalized likelihood appr...